Caseinolytic Proteins (Clp) in the Genus Klebsiella: Special Focus on ClpK

Caseinolytic proteins (Clp), which are present in both prokaryotes and eukaryotes, play a major role in cellular protein quality control and in bacterial survival under harsh environmental conditions. Recently, a member of this protein family, ClpK, was identified in a pathogenic strain of Klebsiella pneumoniae responsible for nosocomial infections. ClpK is linked to the thermal stress survival of this pathogen. A genome-wide analysis of Clp proteins in Klebsiella spp. indicates that ClpK is present in only 34% of the investigated strains. This suggests that the uptake of the clpK gene is selective and that the gene may be acquired only by pathogens that need to survive harsh environmental conditions. In silico analyses and molecular dynamics simulations show that ClpK is mainly α-helical and highly dynamic. ClpK was successfully expressed and purified to homogeneity using affinity and anion exchange chromatography. Biophysical characterization showed that ClpK is predominantly α-helical, in agreement with the in silico analysis of its structure. Furthermore, the purified protein is biologically active and hydrolyses ATP in a concentration-dependent manner.


Introduction
Klebsiella pneumoniae is a Gram-negative, facultatively anaerobic bacterium of the family Enterobacteriaceae that was first isolated and described by Carl Friedländer in 1882; many strains are multidrug resistant [1,2]. This opportunistic pathogen causes respiratory and urinary tract infections and is also recognized as a cause of bloodstream infections in immunocompromised patients [2]. Furthermore, Klebsiella species are among the three leading causes of hospital-acquired infections in the United States, mainly infecting immunocompromised individuals [2–4]. Infections caused by this pathogen are either endogenous or acquired through contact with an infected host or with contaminated hospital equipment such as endoscopes [1]. Several Klebsiella species are resistant to antibiotics such as amino- and carboxypenicillins [2,5–7]. Klebsiella species therefore pose a serious threat, especially to immunocompromised patients in hospital environments [2,5–7]. As a result, […]
… The ATPase regulatory and catalytic (ClpP) subunits function as a complex to degrade misfolded proteins that cannot be reactivated. (C) Some ATPase regulatory subunits (e.g., ClpB) do not interact with ClpP [10,11].
To date, 12 Clp regulatory subunits with diverse cellular functions have been identified across various bacterial species (Table 1). These subunits are classified as Class I or Class II according to the number of nucleotide-binding domains (NBDs) they contain: Class I members contain two NBDs, and Class II members contain one [11,15,16]. Class I proteins are relatively large, ranging from 68 kDa to 110 kDa, whereas Class II proteins are considerably smaller [17]. The two NBDs of Class I proteins, termed Domain 1 (D1) and Domain 2 (D2), differ in amino acid sequence. This sequence difference suggests that Class I members may have evolved through gene fusion rather than gene duplication [14,18]. Each NBD contains the canonical Walker A and Walker B motifs, which are essential for ATP hydrolysis. Walker A forms the floor of the nucleotide-binding pocket and binds the phosphate groups of ATP, whereas Walker B coordinates the Mg2+ ion required for ATP hydrolysis [14,16,17,19].
Table 1. Clp regulatory subunits identified across different bacterial species, exhibiting diverse functions.
Clp proteins are emerging as drug targets against pathogens such as Staphylococcus aureus, Streptococcus pneumoniae and Mycobacterium tuberculosis because of their role in pathogen survival and pathogenicity, and they have also been studied in Bacillus subtilis [10–12,16,26]. It is therefore important to understand the diversity and functions of these proteins in order to target their roles in pathogenicity and in survival in diverse environments. The well-studied Class I Clp proteins include ClpA, ClpB and ClpC; however, very little is known about the properties of ClpK. In this study, we used a bioinformatics approach to investigate the presence of Clp regulatory subunits across Klebsiella species, with a special focus on ClpK. We further performed in silico studies on a modelled structure of ClpK to gain insight into its structural features. To pave the way for functional and structural studies, we report conditions for ClpK expression and purification.

Caseinolytic ATPase Protein Classification
The presence and diversity of Clp regulatory subunits is a continuously studied field in different organisms; however, the presence of Clp proteins in the genus Klebsiella has not been studied adequately [14,18,21,22,24,27]. To address this knowledge gap, we investigated the presence and diversity of Clp proteins in Klebsiella strains. We observed that 98% of the strains studied contained ClpA, the exceptions being K. oxytoca CAV1335 and K. variicola KP5-1 (Figure 2). A similar observation was made for ClpB: only two strains (K. variicola KP5-1 and K. variicola DX120-E) did not contain ClpB (Figure 2). With respect to ClpX, K. pneumoniae CAV1217 was the only strain that lacked the clpX gene (Figure 2). Interestingly, unlike ClpA, ClpB and ClpX, ClpK was present in only 34% of the studied strains. Additionally, only four of the seven Klebsiella species contained ClpK, with the highest number of ClpK-positive strains found within K. pneumoniae (Figure 3).
Molecules 2022, 27, x FOR PEER REVIEW

Figure 3. The distribution of ClpK proteins among the investigated Klebsiella species. The total number of ClpK proteins was tallied from data obtained using the NCBI Genome database. The highest number of ClpK proteins was found in K. pneumoniae (27 strains), followed by K. pneumoniae subsp. pneumoniae (5 strains), K. variicola (1 strain) and K. oxytoca (1 strain).
Our data show that considerably fewer Klebsiella strains carry ClpK than carry ClpA, ClpB or ClpX. Of the 57 K. pneumoniae strains analyzed, only 27 (47%) contained ClpK (Figures 2 and 3). This was unexpected because ClpK is reported to be ubiquitous and is found in various species such as Escherichia coli, Enterobacter cloacae and Klebsiella species other than K. pneumoniae [9]. Moreover, because the clpK gene is hypothesized to spread through horizontal gene transfer, one would expect it to be present in a greater number of the investigated strains [9]. The absence of ClpK from most of the studied strains may instead indicate selective uptake of the clpK gene by pathogens that need it to survive their current harsh environment. Our findings are similar to those of Bojer et al. (2010), who reported that only 31 of 105 clinical isolates contained the clpK gene, suggesting that only certain strains acquired the plasmid through horizontal transfer and subsequently expressed the ClpK protein [9]. Interestingly, none of the studied strains contained ClpC. This observation agrees with the findings of Miller et al. (2018), who suggested that ClpC is a common ancestral protein of ClpK and ClpA. The investigated strains may therefore once have contained ClpC, which may since have diverged into either ClpA or ClpK depending on environmental conditions [15].
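The prevalence figures above are simple proportions. The following minimal Python sketch reproduces them from the Figure 3 tallies and the 27/57 K. pneumoniae count quoted in this section. The overall number of strains screened is not stated here, so `ASSUMED_TOTAL_STRAINS = 100` is a hypothetical value (roughly what "two exceptions correspond to 98%" would imply), not a reported one.

```python
# Prevalence arithmetic for ClpK-positive strains (illustrative sketch).

# ClpK-positive strain counts per taxon, as listed in the Figure 3 caption.
CLPK_POSITIVE = {
    "K. pneumoniae": 27,
    "K. pneumoniae subsp. pneumoniae": 5,
    "K. variicola": 1,
    "K. oxytoca": 1,
}

# Hypothetical: the total number of strains screened is not stated in this section.
ASSUMED_TOTAL_STRAINS = 100


def prevalence(positive: int, total: int) -> float:
    """Return the percentage of strains carrying a gene, to one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive number of strains")
    if not 0 <= positive <= total:
        raise ValueError("positive count must lie between 0 and total")
    return round(100.0 * positive / total, 1)


total_positive = sum(CLPK_POSITIVE.values())
print(total_positive)                                     # 34
print(prevalence(27, 57))                                 # 47.4
print(prevalence(total_positive, ASSUMED_TOTAL_STRAINS))  # 34.0
```

The 34 ClpK-positive strains would match the reported 34% only if roughly 100 strains were analysed; the sketch makes that dependency explicit rather than asserting it.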

Phylogenetic Analysis
A Clustal alignment and a phylogenetic tree analysis were performed to establish the relationships between the investigated Klebsiella Clp proteins. Phylogenetic tree analysis is a useful technique because it provides a hierarchical representation of biological data and shows the evolutionary relationships between species and how they have evolved over time [28]. Clustal analysis of the studied proteins was performed to evaluate the percentage identity shared within each group (Table 2). Percentage identity estimates the percentage of residues that match between the aligned proteins of interest [29]. The high percentage identity of the compared protein sequences (Table 2) indicates that they are orthologs [30]. The phylogenetic tree of the investigated Clp proteins is shown in Figure 4. The root of the tree is in the middle, with branches radiating out in different directions. The ClpX branch radiates directly from the root, which is expected because ClpX belongs to Class II, whereas the other three proteins (ClpA, ClpB and ClpK) belong to Class I [17,20,21,23]. The branches of ClpA, ClpB and ClpK connect before diverging, which suggests that these proteins descended from a common ancestor and are therefore homologs [30]. Figure 4 also shows divergence within the ClpK clade. Clustal analysis of ClpK proteins from the various branches, namely those of K. pneumoniae FDAARGOS 566, K. pneumoniae KPNIH39, K. pneumoniae 2-1, K. pneumoniae WCHKP020098, K. pneumoniae J1, K. pneumoniae FDAARGOS 444, K. pneumoniae CAV1417 and K. pneumoniae subsp. pneumoniae KPNIH32, showed 93.48–100% identity. The varying percentage identity and the visible divergence within ClpK indicate divergence in the evolution of the protein [15].
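Percentage identity is computed from an existing pairwise or multiple alignment. The minimal Python sketch below assumes two pre-aligned sequences with "-" as the gap character and counts only gap-free columns. This denominator is one common convention, not necessarily the one Clustal uses for Table 2; other tools divide by alignment length or by the shorter sequence, so values can differ. The example sequences are toy strings.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage identity between two pre-aligned sequences.

    Only columns where neither sequence has a gap are compared; other
    programs may divide by a different length, so values can differ.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns entirely
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("GESGVGKT", "GEPGVGKT"))  # 87.5 (7 of 8 columns match)
print(percent_identity("MK-LV", "MKALI"))        # 75.0 (gapped column skipped)
```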

Hypothetical ClpK Structure
ClpB (1QVR-B) was identified as an appropriate template for modelling the structure of ClpK, since it had a sequence identity of 52% and a query coverage of 83% (Figure 5). Query coverage indicates how much of the query sequence is included in the alignment; the higher the query coverage, the better the match [29]. Furthermore, the structure of ClpB was determined at 3.00 Å, the best resolution among structures with similar percentage identity. This resolution is considered fairly good, as it allows visualization of well-defined water molecules and gives a fairly good idea of the shape of the macromolecule [31].
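Query coverage can be read directly off an alignment. The sketch below is a minimal Python illustration that defines it as the percentage of query residues aligned to a template residue, assuming pre-aligned strings with "-" for gaps. BLAST-style tools instead report the span of the query covered by the aligned segments, which also counts query residues facing internal template gaps, so the two numbers can differ slightly. The alignment is a toy example.

```python
def query_coverage(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percentage of query residues that are aligned to a template residue."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    query_residues = sum(1 for q in aligned_query if q != gap)
    covered = sum(
        1 for q, t in zip(aligned_query, aligned_template) if q != gap and t != gap
    )
    return 100.0 * covered / query_residues if query_residues else 0.0


# Toy alignment: the template has no counterpart for the first two query residues.
print(query_coverage("MKTAYIAK", "--TAYLAK"))  # 75.0
```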

The alignment of ClpB and ClpK shows five domains, namely the N-terminal domain, the D1-large domain, the D1-small domain, the D2-large domain and the D2-small domain, all of which are conserved within Class I proteins (Figure 5). The Walker A motif (GXXXXGK[T/S], where X represents any residue) binds ATP [14–16] and is 100% identical in the aligned sequences, suggesting that both proteins interact with ATP in a similar manner. The Walker B motif (hhhhD[D/E], where h represents a hydrophobic residue) binds metal ions and thus plays a role in ATP hydrolysis [14–16]; it is 100% identical in NBD1 of the aligned sequences but only 73% identical in NBD2. The non-identical residues in NBD2 may indicate subtle differences in metal binding between these proteins.
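The Walker motif consensus patterns translate naturally into regular expressions. The minimal Python sketch below scans a sequence for the canonical P-loop Walker A pattern G-X4-G-K-[T/S] and a Walker B pattern hhhhD[D/E]. The set of residues treated as hydrophobic ("AVLIMFWC") is an assumption, since definitions vary, and `find_motifs` is a hypothetical helper. A match only flags a candidate motif; it does not show that the site is functional. Matches are non-overlapping, as returned by `re.finditer`. The test sequence is a toy string, not a ClpK or ClpB fragment.

```python
import re

# Walker A (P-loop) consensus G-X4-G-K-[T/S]; X is any residue.
WALKER_A = re.compile(r"G.{4}GK[TS]")
# Walker B consensus hhhhD[D/E]; this hydrophobic residue set is an assumption.
HYDROPHOBIC = "AVLIMFWC"
WALKER_B = re.compile(rf"[{HYDROPHOBIC}]{{4}}D[DE]")


def find_motifs(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return 1-based start positions and matched text of candidate Walker motifs."""
    seq = sequence.upper()
    return {
        "walker_a": [(m.start() + 1, m.group()) for m in WALKER_A.finditer(seq)],
        "walker_b": [(m.start() + 1, m.group()) for m in WALKER_B.finditer(seq)],
    }


# Toy sequence, not a real ClpK or ClpB fragment.
print(find_motifs("MGEPGVGKTAQLAIILDEAK"))
# {'walker_a': [(2, 'GEPGVGKT')], 'walker_b': [(13, 'AIILDE')]}
```

Running the same scan on both sequences of a pairwise alignment would locate the motif windows whose identities are compared in the text.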
Figure 5. Alignment of ClpK with ClpB. The annotations follow the ClpB (1QVR-B) sequence. The α-helices and β-sheets are shown as rectangles and arrows, respectively. ClpK residues colored blue mark the positions of the α-helices, whereas red residues mark the positions of the β-sheets. Asterisks (*) denote identical residues, (:) represents sequence homologies, and (.) represents weak similarity. The five domains and the linker are shown as follows: N-terminal domain (blue); D1 large domain (red); D1 small domain (green); short linker region (grey); D2 large domain (pink); D2 small domain/C-terminal domain (cyan). The secondary structure elements are colored according to the domain. Walker A and Walker B motifs are highlighted in yellow and cyan, respectively. The alignment was performed using T-COFFEE [32].
The modelled ClpK structure was validated using the ProCheck server and is shown in Figure 6A. ATPases can exist in monomeric, dimeric and trimeric states in the absence of nucleotides; the hypothesized trimeric structure of ClpK is therefore not unexpected [23]. Ramachandran analysis of the trimeric ClpK (90.10%) and template ClpB (83.10%) structures showed that the majority of their residues lie within the most favoured regions (Figure 7, Table S1). The MolProbity server gave a Rama Z-score of −0.75 ± 0.16 for the trimeric ClpK structure, which is within the accepted range (|Z| < 2). The structural alignment between monomeric ClpK and ClpB (Figure 6B) gave a root mean square deviation (RMSD) of 0.300 Å, indicating that the two structures adopt a similar conformation [33]. The modelled ClpK structure is consistent with other known ATPase structures, which contain a mixture of α-helices and β-sheets (Figures 5 and 6). Furthermore, the model agrees with the virtual CD data obtained for ClpK using the DichroCalc server and DichroWeb analysis (Figure S1, Table S2). The spectrum shows one ellipticity maximum at about 190 nm and one minimum at about 220 nm, which is characteristic of proteins consisting mainly of α-helices [34].
Comparison of the ClpK N-terminal domain with those of other Clp ATPases shows that ClpK contains a 100-amino-acid N-terminal extension that is unique to this protein [9]. The role of this extension is not known; however, its presence suggests a possible unique role of the ClpK N-terminus in protein homeostasis. The ClpB N-terminal domain contains a substrate-binding groove that is known to recognise hydrophobic residues of unfolded or aggregated proteins [37]. A similar substrate groove was identified in the ClpK N-terminal domain using DoGSiteScorer (Figure S2, Table S3). It could therefore be hypothesized that the ClpK binding pocket recognizes substrates in a similar manner to ClpB.
The NBD1 and NBD2 domains of ClpK adopt a RecA-like fold characterized by a central β-sheet flanked by α-helices (Figure S3). This fold is a common structural feature of most ATPases and assists with the movement of polypeptides into the proteolytic core, a critical step in proteolysis [21,38]. In most Clp proteins, NBD1 and NBD2 are not separated; in ClpK, however, NBD1 and NBD2 are separated by a short linker sequence that adopts a helical structure (Figures 5 and 6B). In ClpB, this linker region is termed "the middle domain" and is essential for chaperone activity, although its exact function has not been fully established [21]. We observed that the linker regions of ClpK and ClpB differ in length, with the ClpB linker being almost twice as long as that of ClpK. The role of the linker region in ClpK is yet to be established; one could envisage that ClpK transports a different range of substrates from those transported by ClpB.

Molecular Dynamics Simulation
To explore the dynamic behavior and stability of the modelled structure, molecular dynamics (MD) simulations and post-dynamic analyses were carried out, with ClpB used as a control throughout. Figure 8 compares the potential energy profiles of ClpK and ClpB over the simulation time. The potential energies obtained for ClpK (−401,975.3 ± 313.1 kcal/mol) and ClpB (−468,132.8 ± 331.6 kcal/mol) show a slight, insignificant shift, indicating that the ClpK structure was adequately modelled.
To assess the dynamic nature of ClpK, we calculated root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values. Figure 9A shows that the RMSD of ClpK (7.22 ± 1.52 Å) increased from 2 Å to 9 Å over 100 ns. A similar increase was observed for ClpB (8.68 ± 2.37 Å), whose RMSD increased from 2 Å to 11 Å over 100 ns. The increasing RMSD values indicate substantial conformational changes, which are shown in Figure 9B,C for ClpB and ClpK, respectively. Conformational changes over 100 ns are consistent with the dynamic nature of proteins [39,40]. Furthermore, we used RMSF values to identify the amino acids that contribute most to protein flexibility: the higher the RMSF value, the greater the flexibility [34]. The D1 small domain and the linker region were identified as the regions contributing to the flexibility of ClpK (3.17 ± 1.73 Å) and ClpB (3.27 ± 2.27 Å) (Figures 5 and 10). The ClpK and ClpB linker regions would have to be investigated further to assess whether the motion of these amino acids plays a role in protein function and stability.
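The RMSD and RMSF analyses above reduce to a few array operations once the trajectory frames have been fitted to a common reference. The minimal NumPy sketch below assumes a (frames, atoms, 3) array of already-superposed Cα coordinates; MD packages provide their own implementations. The hypothetical helper `peak_residue` maps an array index back to a full-length residue number, assuming index 0 corresponds to the first modelled residue (110 for ClpK, 4 for ClpB, as stated in the Figure 10 caption).

```python
import numpy as np


def rmsd_per_frame(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """RMSD of each frame of a (T, N, 3) trajectory to an (N, 3) reference.

    Frames are assumed to be superposed on the reference already.
    """
    diff = traj - ref[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def rmsf_per_atom(traj: np.ndarray) -> np.ndarray:
    """RMSF of each atom around its time-averaged position, shape (N,)."""
    mean_pos = traj.mean(axis=0)
    diff = traj - mean_pos[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))


def peak_residue(rmsf: np.ndarray, first_residue: int) -> int:
    """Full-length residue number of the most flexible position (assumed mapping)."""
    return int(np.argmax(rmsf)) + first_residue


# Toy trajectory: 3 frames, 2 atoms; atom 0 is fixed, atom 1 moves along x.
traj = np.array([
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
print(rmsd_per_frame(traj, traj[1]))  # approx. [0.707, 0.0, 0.707]
print(rmsf_per_atom(traj))            # approx. [0.0, 0.816]
print(peak_residue(rmsf_per_atom(traj), first_residue=110))  # 111
```

RMSD summarises whole-structure drift per frame, whereas RMSF summarises per-residue mobility over all frames, which is why the two are reported together.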
Figure 10. Root mean square fluctuation (RMSF) of ClpK and ClpB. ClpK is represented in red and ClpB in blue on the graph and structures. The positions of the peaks are shown in green on the ClpK and ClpB structures. The highest peak for ClpK is around residues 410 to 425, while the highest peak for ClpB is around residues 379 to 407. To map the flexible regions onto the full-length sequences, 110 and 4 were added to the residue values obtained for ClpK and ClpB, respectively, because the structures were modelled from residues 110 and 4. The graph was generated using Excel. The 3D protein structures were generated using PyMOL.
The radius of gyration (Rg) represents the compactness of a structure [41]. The values obtained for ClpK (40.4 ± 3.93 Å) and ClpB (42.37 ± 4.71 Å) indicate that the two proteins have similar compactness (Figure 11). Additionally, the fluctuations in the Rg profiles of ClpK and ClpB indicate structural transitions, suggesting that the proteins continually change conformation during the simulation [42]. ClpB appears to undergo a conformational transition at around 10,000 ps, whereas ClpK does so only at around 20,000 ps (Figure 11). The variation in the Rg profiles across the simulation time again indicates that both proteins are structurally dynamic (Figures 9 and 11).
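The radius of gyration is the (optionally mass-weighted) root mean square distance of atoms from their centre. The minimal NumPy sketch below weights all atoms equally by default, which is an assumption matching a Cα-only analysis such as Figure 11. It also computes Rg frame by frame for a (frames, atoms, 3) trajectory array; the array layout is assumed, not taken from the study.

```python
from typing import Optional

import numpy as np


def radius_of_gyration(coords: np.ndarray, masses: Optional[np.ndarray] = None) -> float:
    """Radius of gyration of an (N, 3) coordinate set.

    With masses=None every atom is weighted equally, a common choice
    when only Cα atoms are analysed.
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=weights)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=weights)))


def rg_trajectory(traj: np.ndarray) -> np.ndarray:
    """Rg for every frame of a (T, N, 3) trajectory."""
    return np.array([radius_of_gyration(frame) for frame in traj])


print(radius_of_gyration(np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])))  # 1.0
```

A steady Rg trace suggests a compact structure that keeps its overall shape; step changes like those described for ClpB and ClpK show up as shifts in the per-frame values.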

Protein Disorder Prediction
Following homology modelling and MD simulation, we assessed ClpK for disordered protein regions and disordered binding regions. This allowed us to determine whether it would be possible to express and purify soluble ClpK as an initial step towards protein characterization and protein-drug interaction studies. Disordered protein regions do not adopt a stable conformation and therefore make protein purification, protein-ligand binding studies and crystallization difficult [43]. Figure 12 shows that less than 40% of the ClpK residues were predicted to be disordered by IUPred2A (red line), suggesting that ClpK can be expressed and purified [43]. IUPred2A predicts some of the disordered residues to lie in the N-terminal domain (three small peaks), while most of the disordered residues lie in the C-terminal domain (Figure 12). Disordered regions have been noted to carry out important functional roles such as phosphorylation, regulation and protein-DNA binding [43]. Using the Anchor2 server, we observed that most of the disordered binding regions are located in the C-terminal domain (Figure 12). Further studies could investigate molecules that bind to the C-terminal domain and facilitate its transition from a disordered to an ordered state.

Figure 11. Trajectory analysis showing the radius of gyration of the Cα atoms of ClpK and ClpB over 100,000 ps. ClpK is represented in red, and ClpB in blue.

Protein Disorder Prediction
Following homology modelling and MD simulation, we assessed the structure of ClpK for protein disorders and binding disorders. This allowed us to determine if it was possible to express and purify soluble ClpK as an initial step to protein characterization and protein-drug interaction studies. Disordered protein regions do not adopt a stable confirmation and therefore make protein purification, protein-ligand binding studies and crystallization difficult [43]. Figure 12 shows that less than 40% of the ClpK residues were predicted to be disordered through IUPred2A (red line), suggesting that ClpK can be expressed and purified [43]. IUPred2A predicts some of the disordered protein residues to be situated in the N-terminal domain (3 small peaks), while most of the disordered protein residues are seen in the C-terminal domain ( Figure 12). It has been noted, that proteins with disordered regions carry out important functional roles such as phosphorylation, regulation and protein-DNA binding [43]. Using the Anchor2 server, we observed that a majority of the disordered binding regions were situated in the C-terminal domain ( Figure 12). Further studies could focus on investigating molecules that binds to the C-terminal domain to facilitate the transition from a disordered to ordered state. ClpK is represented as red, and ClpB is represented as blue.

Protein Disorder Prediction
Following homology modelling and MD simulation, we assessed the structure of ClpK for protein disorders and binding disorders. This allowed us to determine if it was possible to express and purify soluble ClpK as an initial step to protein characterization and protein-drug interaction studies. Disordered protein regions do not adopt a stable confirmation and therefore make protein purification, protein-ligand binding studies and crystallization difficult [43]. Figure 12 shows that less than 40% of the ClpK residues were predicted to be disordered through IUPred2A (red line), suggesting that ClpK can be expressed and purified [43]. IUPred2A predicts some of the disordered protein residues to be situated in the Nterminal domain (3 small peaks), while most of the disordered protein residues are seen in the C-terminal domain ( Figure 12). It has been noted, that proteins with disordered regions carry out important functional roles such as phosphorylation, regulation and protein-DNA binding [43]. Using the Anchor2 server, we observed that a majority of the disordered binding regions were situated in the C-terminal domain ( Figure 12). Further studies could focus on investigating molecules that binds to the C-terminal domain to facilitate the transition from a disordered to ordered state.  Figure S4) was analysed using the IUPred2A server for the presence of ordered and/or disordered regions. The black line represents the threshold; the red line represents the protein disorder prediction (IUPred2A), and the blue line represents the binding disorder prediction (Anchor2).

Figure 12. Protein disorder prediction for ClpK. The ClpK protein sequence (Figure S4) was analysed using the IUPred2A server for the presence of ordered and/or disordered regions. The black line represents the threshold, the red line represents the protein disorder prediction (IUPred2A), and the blue line represents the binding disorder prediction (Anchor2).
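A "less than 40% disordered" statement can be read as the fraction of residues whose per-residue IUPred2A score lies above the disorder threshold, and the N- and C-terminal peaks as contiguous runs above that threshold. The minimal Python sketch below shows that bookkeeping. It assumes the commonly used cut-off of 0.5 and a list of per-residue scores exported from the server; the function names and the scores in the example are hypothetical, not ClpK output.

```python
from typing import List, Tuple

# Assumed cut-off: IUPred2A scores above 0.5 are treated as disordered.
DISORDER_THRESHOLD = 0.5


def disordered_fraction(scores: List[float], threshold: float = DISORDER_THRESHOLD) -> float:
    """Fraction of residues whose disorder score is above the threshold."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(s > threshold for s in scores) / len(scores)


def disordered_segments(scores: List[float], threshold: float = DISORDER_THRESHOLD) -> List[Tuple[int, int]]:
    """Contiguous disordered stretches as 1-based (start, end) residue positions."""
    segments: List[Tuple[int, int]] = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold and start is None:
            start = pos                        # a disordered stretch begins here
        elif score <= threshold and start is not None:
            segments.append((start, pos - 1))  # stretch ended at the previous residue
            start = None
    if start is not None:                      # stretch runs to the C-terminus
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-residue scores for a 10-residue stretch (not ClpK output).
scores = [0.2, 0.6, 0.7, 0.3, 0.1, 0.1, 0.55, 0.8, 0.9, 0.4]
print(disordered_fraction(scores))   # 0.5
print(disordered_segments(scores))   # [(2, 3), (7, 9)]
```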

Expression and Purification of ClpK
To our knowledge, the expression and purification of ClpK have not been reported to date. To test the expression of ClpK, E. coli BL21 cells were transformed with the pCold-I plasmid carrying the clpk gene (ClpK construct). Different expression conditions were tested to identify suitable conditions for expressing soluble ClpK. Figure 13 shows successful induction and expression of soluble ClpK at 0.1, 0.25 and 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), as indicated by a protein band at the molecular weight expected for ClpK (Figure 13). Based on the band intensity of expressed ClpK, 0.25 mM IPTG was selected as the optimal concentration for expression (Figure 13, Lane 6).

Following expression, ClpK was purified using ion exchange and affinity chromatography. The expressed protein was first subjected to ion exchange chromatography, which is based on the electrostatic interaction between the resin and the protein [44]. At pH 7.4, ClpK (pI 5.61) is negatively charged and therefore binds to the positively charged anion exchange resin. Bound protein was eluted with increasing concentrations of sodium chloride in Buffer B (Figure 14A). ClpK co-eluted with impurities even at the highest salt concentration, and the recombinant protein reached only 2.7% purity after ion exchange chromatography (Table 3); therefore, a second purification step was performed. The eluate from anion exchange was passed through an affinity chromatography column. ClpK was expected to bind to the affinity resin through its His-tag, while the contaminating proteins were expected to flow through [44]. As shown in Figure 14B and Table 3, the partially purified ClpK was purified to homogeneity (Figure 14B, Lanes 6-10). The specific activity increased from 0.0423 units/mg in the supernatant to 11.13 units/mg in the homogeneous protein sample (Table 3). Specific activity is an important measure because it can be used to assess the purity of a protein following a two-step purification procedure [45].
(a) Soluble fraction obtained from 0.77 g wet weight of E. coli cell pellet (from 1 L of bacterial culture).
(b) Protein concentration determined by Bradford assay using BSA as a standard protein [46].
(c) Calculated using the ATPase assay; the release of phosphate ions is measured as ATP is converted to ADP (A620nm).
(d) Eluate collected from the ion exchange column.
(e) Pooled eluate collected from the HisTrap column.
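A purification table is usually summarised by three standard ratios: specific activity (total activity divided by total protein), fold purification (specific activity at a step divided by that of the crude extract) and percentage yield. The minimal Python sketch below implements these ratios; only the two specific activities quoted above come from this study, and the function names are illustrative.

```python
def specific_activity(total_activity_units: float, total_protein_mg: float) -> float:
    """Specific activity (units/mg) = total activity / total protein."""
    return total_activity_units / total_protein_mg


def fold_purification(step_specific_activity: float, crude_specific_activity: float) -> float:
    """Enrichment of a purification step relative to the crude extract."""
    return step_specific_activity / crude_specific_activity


def percent_yield(step_total_activity: float, crude_total_activity: float) -> float:
    """Fraction of the starting activity recovered at a step, in percent."""
    return 100.0 * step_total_activity / crude_total_activity


# Specific activities quoted in the text (units/mg).
supernatant = 0.0423
homogeneous = 11.13
print(f"{fold_purification(homogeneous, supernatant):.0f}-fold")  # 263-fold
```

Taking the quoted values at face value, the two-step procedure corresponds to roughly a 263-fold enrichment relative to the supernatant.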

Enzyme Activity Assay
ClpK has been identified as an ATPase because it contains structural motifs associated with ATP hydrolysis [14,16,17,19]. To our knowledge, the ability of ClpK to hydrolyse ATP in vitro has not been reported to date. We therefore tested the biological activity of the purified protein using an ATPase assay. Figure 15 shows the ATPase activity of ClpK. Enzyme activity increased from 0 ± 0 units/L to 10.43 ± 0.72 units/L as the ClpK concentration increased from 0 mg/mL to 0.006 mg/mL, indicating that ATP hydrolysis depends on ClpK concentration (Figure 15).
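One simple way to quantify concentration dependence is to fit activity against ClpK concentration by ordinary least squares and inspect the slope and R². The minimal Python sketch below does this; the data points are hypothetical placeholders, not the Figure 15 measurements, and `linear_fit` is an illustrative name.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Hypothetical points: [ClpK] in mg/mL against activity in units/L.
conc = [0.0, 0.002, 0.004, 0.006]
activity = [0.0, 3.5, 7.0, 10.5]
slope, intercept, r2 = linear_fit(conc, activity)
print(f"slope = {slope:.0f} units/L per mg/mL, R^2 = {r2:.4f}")
```

A positive slope with R² close to 1 is consistent with activity scaling with enzyme concentration over the tested range; curvature at higher concentrations would instead suggest substrate depletion or saturation of the readout.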

Far UV-CD Spectroscopy
Measuring the CD of a protein over the far-UV range (185-250 nm) produces a spectrum that serves as a fingerprint of the secondary structure of the protein or peptide [47]. We therefore used CD spectroscopy to explore the secondary structure of ClpK in the presence and absence of ATP. The CD spectra in the presence and absence of ATP indicated a high α-helical and lower β-structural content (Figure 16). In the absence of ATP, the CD spectrum of ClpK shows negative minima at 208 ± 1 and 220 ± 1 nm and a maximum at 192 ± 1 nm. The experimental CD spectra are consistent with the structure used in modelling ClpK (Figures 6 and S1). In the presence of ATP, the maximum shifted to 193 ± 1 nm. This suggests a slight change in secondary structure content upon ATP binding, which is not unusual because ClpK contains two ATP binding sites and is considered to be an ATPase.

Extrinsic Fluorescence Spectroscopy
Tertiary structure analysis was carried out using extrinsic fluorescence spectroscopy based on mant-ATP and ANS fluorescence (Figure 17). Both mant and ANS are fluorophores whose spectral properties depend on the polarity of their environment [48,49]. When mant or ANS binds within a hydrophobic pocket in a protein, an increase in quantum yield with a concomitant blue shift in the maximum emission wavelength (λmax) is observed [48,49]. The magnitude of the shift in λmax often depends on the hydrophobicity of the environment [48,49].
Mant-ATP fluorescence was used to further establish the presence of an ATP binding site (Figure 17). Binding of mant-ATP is indicated by an increase in fluorescence quantum yield at 450 nm compared with free mant-ATP. However, no shift in λmax was observed upon mant-ATP binding (blue region of the spectrum, at about 450 nm). This suggests that the local environment of the mant-ATP interaction may not be hydrophobic, which is expected because ATP is highly hydrophilic and requires hydrophilic amino acid side chains and Mg2+ for interaction. Together with the ATPase assay, this binding further supports the conclusion that the recombinant protein is active towards ATP.
An ANS binding assay was also used to probe for hydrophobic patches on ClpK in the presence or absence of ATP. With excitation at 390 nm, ClpK in both the presence and absence of ATP showed a substantial increase in ANS quantum yield compared with free ANS. The maximum emission wavelength also shifted from 520 nm (free ANS) to 481 nm for ClpK with ATP and to 489 nm for ClpK alone. This indicates that ANS accesses hydrophobic pockets in ClpK.
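The λmax values and shifts above are read from the emission maxima of the spectra. The minimal Python sketch below shows that read-out, assuming spectra are available as wavelength and intensity lists sampled every 1 nm over the 400-600 nm scan range. The Gaussian bands are synthetic stand-ins placed at the reported ANS maxima, not measured data, and all names are illustrative.

```python
import math
from typing import List, Sequence


def lambda_max(wavelengths: Sequence[float], intensities: Sequence[float]) -> float:
    """Wavelength of maximum emission intensity (simple arg-max, no peak interpolation)."""
    if len(wavelengths) != len(intensities) or not wavelengths:
        raise ValueError("wavelength and intensity lists must match and be non-empty")
    i_max = max(range(len(intensities)), key=intensities.__getitem__)
    return wavelengths[i_max]


def blue_shift(free_lmax: float, bound_lmax: float) -> float:
    """Positive values mean emission moved to shorter wavelength on binding."""
    return free_lmax - bound_lmax


def gaussian_spectrum(wavelengths: Sequence[float], center: float,
                      width: float = 30.0, height: float = 1.0) -> List[float]:
    # Synthetic emission band, used only to exercise the functions above.
    return [height * math.exp(-((w - center) / width) ** 2) for w in wavelengths]


wl = list(range(400, 601))                      # 400-600 nm in 1 nm steps
free = gaussian_spectrum(wl, 520)               # stand-in for free ANS
bound = gaussian_spectrum(wl, 481, height=4.0)  # stand-in for bound ANS (higher yield)
print(blue_shift(lambda_max(wl, free), lambda_max(wl, bound)))  # 39
```

The arg-max is limited to the sampling step; fitting or interpolating around the peak gives sub-nanometre estimates.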
In summary, we used bioinformatic analysis to investigate the distribution of Clp proteins, particularly ClpK, across seven Klebsiella species. Our data showed that clpk is considerably less widely distributed than clpa, clpb and clpx. Additionally, a distinguishing structural feature of ClpK among the Class I proteins is a shorter middle domain linking NBD1 and NBD2. Because this domain is implicated in recognising substrates for unfoldase activity, ClpK could potentially recognise a different range of substrates. Future studies should therefore investigate the role of the ClpK middle domain.
This study is the first to report conditions for the expression and purification of ClpK. Furthermore, we characterised the biophysical properties of ClpK and demonstrated that purified ClpK is biologically active. This provides a basis for developing strategies that target the thermal stress survival of K. pneumoniae as an alternative approach to antibiotic therapy.

Species and Databases
The proteomes of 100 Klebsiella strains (7 species, complete draft) were collected from the National Center for Biotechnology Information (NCBI) Genome database. The 100 strains analyzed were: 8 K. aerogenes strains, 1 K. michiganensis strain, 7 K. oxytoca strains, 57 K. pneumoniae strains, 15 K. pneumoniae subsp. pneumoniae strains, 3 K. quasipneumoniae strains and 9 K. variicola strains. The Klebsiella species, strains, web links and references are listed in the supplementary material (Table S3).

Genome Data Mining and Annotation of Caseinolytic Proteins
Clp proteins were mined from the different Klebsiella species by first obtaining the protein file of each strain. Each file was then searched individually for the presence of Clp proteins, and the sequence of each Clp protein was extracted from the main file and used for further analysis.
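The search strategy is not specified beyond inspecting each protein file. The minimal Python sketch below shows one way to do it. It assumes that the Clp class name (for example "ClpK") appears as a word in the FASTA description line, which may not hold for every annotation pipeline. The regular expression, function names and mini-proteomes are all hypothetical. The final helper reports the percentage of strains carrying a given Clp class, the kind of figure behind statements such as the proportion of strains carrying ClpK.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Assumed annotation convention: a word such as "ClpK" in the FASTA header.
CLP_NAME = re.compile(r"\b(Clp[A-Z])\b")

Hits = Dict[str, List[Tuple[str, str]]]


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def mine_clp(text: str) -> Hits:
    """Group the Clp proteins of one proteome file by class name."""
    hits: Hits = {}
    for header, seq in read_fasta(text):
        match = CLP_NAME.search(header)
        if match:
            hits.setdefault(match.group(1), []).append((header, seq))
    return hits


def percent_with(strain_hits: Dict[str, Hits], clp_class: str) -> float:
    """Percentage of strains with at least one protein of clp_class."""
    present = sum(1 for hits in strain_hits.values() if hits.get(clp_class))
    return 100.0 * present / len(strain_hits)


# Hypothetical mini-proteomes for two strains.
strain_a = """>p1 ATP-dependent protease ClpK [hypothetical]
MSLE
>p2 chaperone ClpB [hypothetical]
MKLD"""
strain_b = """>q1 chaperone ClpB [hypothetical]
MKLE
EE"""
hits = {"A": mine_clp(strain_a), "B": mine_clp(strain_b)}
print(percent_with(hits, "ClpK"))  # 50.0
```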

Phylogenetic Analysis of Clp Proteins
Phylogenetic analysis of the Clp proteins was carried out using the method described by Ngcobo et al. (2020) [50]. Briefly, the protein sequences were aligned using MAFFT v6.864 embedded in the T-REX web server [51,52]. The alignments were then processed and optimized automatically by the T-REX web server, and the file for the best tree was visualized and colored using the Interactive Tree of Life (iTOL) server [53].

Clp Protein Homology Analysis
The percentage identity between individual proteins within each Clp class was analyzed using Clustal Omega [54]. Full-length Clp protein sequences were subjected to Clustal analysis to obtain percentage identity matrices, and the results were tabulated in an Excel spreadsheet for analysis.
The percentage identity between the different Clp classes (ClpA, ClpB, ClpK and ClpX) was analyzed in the same way.
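The minimal Python sketch below shows how a percentage identity matrix can be computed from an existing multiple alignment and written out as CSV for a spreadsheet. It assumes one common definition (identical positions divided by positions where neither sequence has a gap). Clustal Omega may normalise differently, so values need not match its output exactly. The toy alignment and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], float]


def percent_identity(a: str, b: str) -> float:
    """Identity of two aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(aligned: Dict[str, str]) -> Matrix:
    """Symmetric all-against-all identity matrix."""
    matrix: Matrix = {(n, n): 100.0 for n in aligned}
    for p, q in combinations(aligned, 2):
        matrix[(p, q)] = matrix[(q, p)] = percent_identity(aligned[p], aligned[q])
    return matrix


def to_csv(matrix: Matrix, names: List[str]) -> str:
    """Render the matrix as CSV text that a spreadsheet can open."""
    rows = ["," + ",".join(names)]
    for r in names:
        rows.append(r + "," + ",".join(f"{matrix[(r, c)]:.1f}" for c in names))
    return "\n".join(rows)


# Hypothetical toy alignment (not real Clp sequences).
aln = {"seq1": "MKV-LE", "seq2": "MKIALE", "seq3": "MRV-LD"}
m = identity_matrix(aln)
print(to_csv(m, list(aln)))
```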

Homology Modelling of ClpK
To model the structure of ClpK (UniProt: E0W6V3), a template search was performed on PHYRE [55], I-TASSER [56][57][58] and NCBI [59]. The template was selected based on high sequence identity, high query coverage and high structure resolution. Thermus thermophilus ClpB (1QVR-B), which shares 52% identity and 83% query coverage with the target protein, was therefore selected. The modelled ClpK and the template ClpB were submitted to ProCheck for stereochemical analysis using Ramachandran analysis [60]. The MolProbity server was used to obtain the Rama Z-score for the trimeric ClpK structure [61]. The model was then submitted to the Maestro v12.2 molecular modelling suite [62]. To pre-process the protein structure, bond orders were assigned, hydrogen atoms were added, zero-order bonds to metals and disulphide bonds were created, and water molecules beyond 5 Å from heteroatoms were deleted. Additionally, the PROPKA algorithm was used at pH 7.0, and the hydrogen bonding network was optimized by sampling the orientation of water molecules. Lastly, the OPLS_2005 force field was used to refine the structure through minimisation. The stereochemistry of the side chains was checked to ensure that no major perturbations had been introduced while preparing the structure. The minimized structure was saved as a Maestro (.mae) file for subsequent prediction analysis.
The ClpK and ClpB sequences were aligned using T-Coffee in preparation for modelling [32]. The positions of the Walker A and Walker B motifs on the template protein were assigned as described by Lee et al. (2003) [21]. The final ClpK structure was then visualized using PyMOL, and the RMSD value was obtained [54]. The monomeric ClpK and template ClpB structures were processed using the ProteinPlus server (https://proteins.plus, accessed on 15 July 2021) to identify the positions of binding pockets in each protein structure [63].
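The RMSD reported after structural comparison is the root-mean-square distance between matched atoms once the structures are superposed. The minimal Python sketch below shows that final step. It assumes the two coordinate sets are already superposed and paired atom by atom (for example Cα to Cα). PyMOL's alignment commands also perform the superposition and can reject outlier pairs during refinement, so their reported value may differ from a plain all-atom RMSD. The coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD (Å) between matched atoms of two structures that are already superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(squared / len(a))


# Hypothetical matched Cα pairs: the second set is offset by 1 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(model, template))  # 1.0
```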

Molecular Dynamics Simulation
Molecular dynamics (MD) simulations were carried out in Maestro v12.2 using the GPU-enabled Desmond molecular dynamics simulation engine. The trimeric ClpK model and the template ClpB structure were each saved as PDB files and submitted to a Linux (Ubuntu) desktop server for Desmond MD simulations. Each protein was placed in an orthorhombic box (the distance from each box face to the outermost protein atom was set to 10 Å, and the box angles were α = β = γ = 90°). The box volume was minimized, and counter ions were added to neutralize the system. NaCl (0.15 M) was added to the solvent box to mimic physiological conditions. After solvation and ionization, the system was submitted for MD simulation, which was divided into eight stages with simulation parameters specified for each stage. Stages 1-7 constitute equilibration and consist of short simulations, and stage 8 is the final 100 ns production simulation. In stage 1, the type and parameters of the solvated system were detected. In stage 2, a 100 ps simulation was carried out using Brownian dynamics under NVT conditions at 10 K with restraints on solute heavy atoms. Stage 3 involved a 12 ps simulation under NVT conditions at 10 K with restraints on heavy atoms. Stages 4, 6 and 7 (pocket solvation at stage 5 was skipped) employed short simulations (12, 12 and 24 ps, respectively) under NPT conditions, at 10 K and with restraints on heavy atoms for stages 4 and 6. No restraints were placed on heavy atoms at stage 7. The final production stage (stage 8) was run at a constant temperature of 300 K.
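The box and salt settings above translate into simple arithmetic: each box edge is the solute extent plus twice the 10 Å buffer, and the number of NaCl pairs scales with concentration times volume. The minimal Python sketch below shows this. It uses a crude estimate based on the whole box volume; Desmond's System Builder performs its own calculation, which may differ, for example by excluding the solute volume. The solute extent and all names are hypothetical.

```python
from typing import Tuple

AVOGADRO = 6.02214076e23           # mol^-1
LITRES_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L

Box = Tuple[float, float, float]


def orthorhombic_box(extent: Box, buffer: float = 10.0) -> Box:
    """Edge lengths (Å) leaving `buffer` Å between each face and the outermost solute atom."""
    return tuple(e + 2.0 * buffer for e in extent)


def salt_ion_pairs(box: Box, molar: float = 0.15) -> int:
    """Crude NaCl pair count for the whole box volume (solute volume not subtracted)."""
    volume_l = box[0] * box[1] * box[2] * LITRES_PER_CUBIC_ANGSTROM
    return round(molar * volume_l * AVOGADRO)


def neutralising_ions(net_charge: int) -> Tuple[str, int]:
    """Counter-ion type and number needed to cancel an integer net charge."""
    return ("Na+", -net_charge) if net_charge < 0 else ("Cl-", net_charge)


# Hypothetical solute extent of 80 x 60 x 40 Å (not the ClpK trimer).
box = orthorhombic_box((80.0, 60.0, 40.0))
print(box)                  # (100.0, 80.0, 60.0)
print(salt_ion_pairs(box))  # 43
```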

Post-Dynamic Analysis
Post-dynamic analyses of the trajectories derived from the MD simulations were carried out using Schrödinger Maestro v12.2 or the Bio3D R package for comparative analysis of protein structures. First, Simulation Quality Analysis (implemented in Maestro v12.2) was used to assess simulation quality from the average energy, pressure, temperature and volume. Second, the Simulation Interaction Diagram tool (implemented in Maestro v12.2) was used to analyse the root-mean-square deviation (RMSD) of the alpha carbon atoms (Cα) and the root-mean-square fluctuation (RMSF) of the residues, and to carry out a secondary structure element analysis. Finally, the Simulation Events Analysis tool (implemented in Maestro v12.2) was used to calculate the radius of gyration (Rg) and atomic distances.
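RMSF and Rg have simple definitions: RMSF is each atom's root-mean-square deviation from its time-averaged position, and Rg is the mass-weighted root-mean-square distance of atoms from their centre of mass. The minimal Python sketch below implements both definitions. It assumes trajectory frames have already been superposed on a common reference, as the analysis tools do before computing RMSF. The coordinates and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration; equal masses are used when none are given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    centre = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    spread = sum(m * sum((c[k] - centre[k]) ** 2 for k in range(3)) for m, c in zip(masses, coords))
    return math.sqrt(spread / total)


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom RMS fluctuation about the time-averaged position.

    Assumes every frame has already been superposed on a common reference.
    """
    n_frames = len(frames)
    fluctuations = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


# Toy two-atom system and two-frame trajectory (hypothetical coordinates, Å).
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))  # 1.0
print(rmsf([[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
            [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]))                # [1.0, 0.0]
```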

Protein Disorder and Circular Dichroism Analysis
The primary amino acid sequence of ClpK was used to predict protein disorder and a virtual circular dichroism (CD) spectrum. The sequence was analyzed on the IUPred2A server for protein disorder prediction [64]. The sequence was also analyzed using DichroCalc, available from The Hirst Group Home Page (https://comp.chem.nottingham.ac.uk/dichrocalc/, accessed on 28 May 2020). The CD spectrum obtained from DichroCalc was then analyzed for secondary structure both manually and using the DichroWeb server [65].

ClpK Expression and Purification
The cDNA sequence encoding ClpK (UniProt: E0W6V3) was retrieved from GenBank. The gene was synthesized, codon-optimized and cloned into the pColdI vector using BamHI and SalI by GenScript (Piscataway, NJ, USA). The resulting plasmid (pColdI-ClpK) was transformed into Escherichia coli BL21 cells. E. coli BL21 pColdI-ClpK glycerol stocks (100 µL) were used to inoculate 10 mL of lysogeny broth (LB) supplemented with 10 µL of 100 mg/mL ampicillin. The cells were grown overnight (16 h) at 37 °C. Cells from the overnight culture (1 mL) were then used to inoculate flasks containing 100 mL of LB and 100 µL of 100 mg/mL ampicillin. The cells were grown to an OD of 0.4 to 0.6 before being cold-shocked on ice for 30 min. The culture was then induced with 0.25 mM isopropyl-β-D-thiogalactopyranoside (IPTG) and incubated at 15 °C for 24 h. Uninduced cells served as the control and were grown alongside the induced cells.
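The antibiotic and inducer additions above follow the dilution relation C1V1 = C2V2. The minimal Python sketch below works through this arithmetic. The ampicillin figures come from the protocol. The 1 M IPTG stock is a hypothetical assumption, because the stock concentration is not stated.

```python
def final_concentration(stock: float, volume_added: float, total_volume: float) -> float:
    """C2 = C1 * V1 / V2 (both volumes in the same unit)."""
    return stock * volume_added / total_volume


def volume_to_add(stock: float, target: float, total_volume: float) -> float:
    """V1 = C2 * V2 / C1, neglecting the small volume the addition itself contributes."""
    return target * total_volume / stock


# Ampicillin: 10 µL of a 100 mg/mL (100,000 µg/mL) stock in 10 mL (10,000 µL) of LB.
print(final_concentration(100_000.0, 10.0, 10_000.0), "µg/mL")  # 100.0 µg/mL

# IPTG: µL of a hypothetical 1 M (1000 mM) stock for 0.25 mM in 100 mL (100,000 µL).
print(volume_to_add(1000.0, 0.25, 100_000.0), "µL")  # 25.0 µL
```

The same relation gives 100 µg/mL for the 100 µL of ampicillin stock added to the 100 mL cultures.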
Both uninduced and induced cells were harvested by centrifugation at 6800 × g for 15 min. The resulting pellets were resuspended in binding buffer (20 mM sodium phosphate, 20 mM imidazole, pH 7.4). The cells were then lysed by sonication (15 min; 30 s on, 20 s off; 50% amplitude), followed by centrifugation. Both the soluble and insoluble fractions were analyzed by reducing 12.5% SDS-PAGE [66]. Protein concentration was determined by absorbance at 280 nm using a Thermo Fisher NanoDrop 2000 [67].
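Absorbance readings at 280 nm are converted to concentration with the Beer-Lambert law (c = A / (ε·l)), which requires the protein's molar absorption coefficient. The minimal Python sketch below predicts that coefficient from sequence. It assumes the Trp, Tyr and cystine coefficients of Pace et al. (1995), a 1 cm-equivalent path length, and no disulphides by default (reducing conditions). The sequence fragment and absorbance are hypothetical; in practice, the full ClpK sequence would be used.

```python
# Assumed molar absorption coefficients at 280 nm (M^-1 cm^-1), from Pace et al. (1995).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulphide-bonded Cys pair, not per Cys


def extinction_coefficient(sequence: str, n_cystines: int = 0) -> int:
    """Predicted epsilon280 from Trp, Tyr and cystine content."""
    seq = sequence.upper()
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * n_cystines


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), in mol/L."""
    return a280 / (epsilon * path_cm)


def mg_per_ml(molar: float, molecular_weight_da: float) -> float:
    """mol/L x g/mol = g/L, which is numerically equal to mg/mL."""
    return molar * molecular_weight_da


eps = extinction_coefficient("MWYKW")  # hypothetical fragment: 2 Trp, 1 Tyr
conc = molar_concentration(0.5, eps)   # hypothetical A280 reading
print(eps, f"{conc * 1e6:.1f} µM")
```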
Purification was carried out using first a 5 mL HiTrap Q HP anion exchange column (GE Healthcare, Chicago, IL, USA) and then a 1 mL HisTrap HP column (GE Healthcare, Chicago, IL, USA). The HiTrap Q HP column was equilibrated with binding buffer, and the soluble cell fraction was passed through the column. Unbound proteins were removed from the column with binding buffer, and bound protein was eluted with Buffer B (20 mM imidazole, 20 mM sodium phosphate and 0.2 M NaCl, pH 7.4).
Fractions containing the eluted protein were pooled and passed through the HisTrap HP column, which had been equilibrated with Buffer B (20 mM imidazole, 20 mM sodium phosphate and 0.2 M NaCl, pH 7.4). Unbound proteins were removed from the column with Buffer B, and bound protein was eluted with Buffer C (192 mM imidazole, 20 mM sodium phosphate and 0.5 M NaCl, pH 7.4). The eluted fractions were collected and analyzed on reducing 12.5% SDS-PAGE gels [66].
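The choice of an anion exchanger at pH 7.4 rests on the protein carrying a net negative charge above its pI (5.61 for ClpK). The minimal Python sketch below estimates net charge from sequence with the Henderson-Hasselbalch equation and locates the pI by bisection. It assumes an EMBOSS-style pKa table; other tools use different values, so their pI estimates can differ by a few tenths of a unit. The peptides and function names are hypothetical.

```python
# Assumed pKa set (EMBOSS-style values); other tools use slightly different tables.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of one polypeptide chain at `ph` (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    basic = {"Nterm": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    acidic = {"Cterm": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    positive = sum(n / (1 + 10 ** (ph - PKA_BASIC[g])) for g, n in basic.items())
    negative = sum(n / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g, n in acidic.items())
    return positive - negative


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection (net charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def binds_anion_exchanger(sequence: str, ph: float) -> bool:
    """A net negative charge at the buffer pH favours binding to a Q (anion exchange) resin."""
    return net_charge(sequence, ph) < 0


# Hypothetical peptides, not ClpK:
print(binds_anion_exchanger("DDEEGG", 7.4))  # True
print(binds_anion_exchanger("KKRRGG", 7.4))  # False
```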

Enzyme Activity Assay
ATP hydrolysis was measured using the ATPase/GTPase Activity Assay Kit (Lot# 113BI08A16) from Sigma-Aldrich according to the manufacturer's instructions. Briefly, ClpK (0 to 0.006 mg/mL) was mixed with 30 µL of reaction buffer (40 mM Tris, 80 mM NaCl, 8 mM MgAc2, 1 mM EDTA and 4 mM ATP, pH 7.5) and incubated at room temperature for 30 min. ATPase activity was followed by measuring, spectrophotometrically at 620 nm, the phosphate released as ATP is converted to ADP. Measurements were performed in triplicate.
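Colorimetric phosphate assays are normally quantified against a phosphate standard curve, and triplicates are reported as mean ± standard deviation. The minimal Python sketch below shows that workflow. It assumes that 1 unit equals 1 µmol of phosphate released per minute, so that µM phosphate per minute equals units/L. Check this definition against the kit's technical bulletin. The standards, readings and function names are hypothetical, not the Figure 15 data.

```python
import statistics
from typing import List, Sequence, Tuple


def standard_curve(phosphate_um: Sequence[float], a620: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of A620 versus phosphate concentration (µM)."""
    mx, my = statistics.fmean(phosphate_um), statistics.fmean(a620)
    sxy = sum((x - mx) * (y - my) for x, y in zip(phosphate_um, a620))
    sxx = sum((x - mx) ** 2 for x in phosphate_um)
    slope = sxy / sxx
    return slope, my - slope * mx


def activity_units_per_litre(a620: float, slope: float, intercept: float, minutes: float) -> float:
    """Convert A620 to µM phosphate, then divide by the reaction time.

    Assumes 1 unit = 1 µmol phosphate per minute, so µM/min equals units/L.
    """
    phosphate_um = (a620 - intercept) / slope
    return phosphate_um / minutes


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) of replicates."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical phosphate standards and triplicate readings after a 30 min reaction.
slope, intercept = standard_curve([0, 10, 20, 30, 40], [0.1, 0.3, 0.5, 0.7, 0.9])
replicates: List[float] = [activity_units_per_litre(a, slope, intercept, 30.0) for a in (0.48, 0.50, 0.52)]
mean, sd = mean_sd(replicates)
print(f"{mean:.3f} ± {sd:.3f} units/L")
```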

Far-UV Circular Dichroism Spectroscopy
The secondary structure content of the protein was analysed by far-UV circular dichroism using 2 µM ClpK in 5 mM sodium phosphate, pH 7.4, both in the absence and presence of 0.2 mM ATP. A Jasco J-1500 spectropolarimeter was used at 20 °C with a 2 mm quartz cuvette. Spectra were recorded from 250 to 180 nm as 5 accumulations, with a 2.5 nm bandwidth, 0.2 nm data pitch and 1 s response time. The CD spectra were recorded in millidegrees of ellipticity (θmdeg) and later converted to mean residue ellipticity [θ]MRE (deg·cm²·dmol⁻¹) using the following equation:
$$[\theta]_{\mathrm{MRE}} = \frac{100\,\theta_{\mathrm{mdeg}}}{c\,n\,l}$$
where θmdeg is the measured ellipticity in mdeg, c is the protein concentration in mM, n is the number of amino acid residues in the protein, and l is the path length in cm. The data were analysed and processed with DichroWeb using the CONTIN method.
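With c in mol/L, mean residue ellipticity is θmdeg / (10·c·n·l). With c in mM, as defined here, this becomes 100·θmdeg / (c·n·l). The minimal Python sketch below applies that conversion point by point. The 2 µM concentration and 0.2 cm path length come from the protocol. The residue count and ellipticity reading are hypothetical, and the function names are illustrative.

```python
from typing import List, Sequence


def mean_residue_ellipticity(theta_mdeg: float, conc_mm: float, n_residues: int, path_cm: float) -> float:
    """[θ]MRE in deg·cm²·dmol⁻¹ from an ellipticity reading in mdeg.

    Assumes conc_mm is the protein concentration in mM, so that
    [θ]MRE = 100 * θ_mdeg / (c * n * l).
    """
    return 100.0 * theta_mdeg / (conc_mm * n_residues * path_cm)


def convert_spectrum(theta_mdeg: Sequence[float], conc_mm: float, n_residues: int, path_cm: float) -> List[float]:
    """Apply the conversion to every point of a spectrum."""
    return [mean_residue_ellipticity(t, conc_mm, n_residues, path_cm) for t in theta_mdeg]


# 2 µM protein (0.002 mM) in a 0.2 cm (2 mm) cell; residue count and reading are hypothetical.
print(round(mean_residue_ellipticity(-20.0, 0.002, 850, 0.2)))  # -5882
```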

Extrinsic Fluorescence Spectroscopy
ATP binding to ClpK was probed using methylanthraniloyl-ATP (mant-ATP). Mant-ATP (10 µM) was added to each protein sample, and samples were excited at 355 nm with excitation and emission bandwidths of 2.5 nm. Emission was collected from 400 to 600 nm in three accumulations.
Extrinsic ANS fluorescence was used to probe for hydrophobic pockets on the protein. ANS (200 µM) was incubated with the protein for 1 h in the dark, and the samples were excited at 390 nm. Emission spectra were collected from 400 to 600 nm with 5 nm excitation and emission bandwidths at a scanning speed of 200 nm/min. The experiments were performed using a Jasco FP-6300 fluorescence spectrophotometer at 20 °C with a 10 mm quartz cuvette.
All fluorescence samples contained 2 µM protein in 10 mM sodium phosphate, pH 7.4, in the absence or presence of 2 mM ATP; ATP-bound samples also contained 5 mM MgCl2. Buffer contributions were subtracted from each final spectrum.